A widely studied model for generating sequences is to "evolve" them on a
tree according to a symmetric Markov process. We prove that model trees
tend to be maximally "far apart" in terms of variational distance.



1. Introduction. In this paper we investigate sequences that have been
generated on a tree by a simple Markov model. Such processes are widely
studied in molecular genetics and in other areas of applied probability
(including broadcasting and statistical physics). More precisely, we study
the separation, as measured by variational distance, of the probability
distributions on sequence patterns generated by different trees. We find
that a large tree generates a probability distribution that is typically at
maximal distance from that generated by nearly all other trees.
To describe our results more precisely, we first provide some terminology
concerning trees and random processes on them. In a tree, vertices of
degree 1 are called leaves, as opposed to internal vertices. A tree is
binary if all vertices have degree 1 or 3. Consider a set $X$ of labels. A
phylogenetic $X$-tree is a tree in which leaves are identified with elements
of $X$. (For technical reasons, we do not require phylogenetic $X$-trees to
be binary by definition, as we will have to deal with subtrees of
phylogenetic $X$-trees.) We will regard two phylogenetic $X$-trees as being
identical if there is a graph isomorphism between them which, in addition,
if restricted to $X$, is the identity function of $X$. If $|X| = n$, then the
number of different binary phylogenetic $X$-trees is $(2n-5)!!$
$[= 1 \times 3 \times 5 \times \cdots \times (2n-5)]$ [16]. For a
phylogenetic $X$-tree $T$, let $[T]$ denote the corresponding unlabeled tree.
The distance $d_T(u,v)$ between two vertices $u, v$ in a tree $T$ is the
number of edges on the unique path connecting them.
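
To give a feeling for how quickly the count $(2n-5)!!$ grows, the following minimal sketch evaluates it for a few small values of $n$. The function name `num_binary_trees` is our own label for illustration and does not come from the cited literature.

```python
def num_binary_trees(n: int) -> int:
    """Number of binary phylogenetic X-trees with |X| = n leaves: (2n-5)!!.

    Hypothetical helper for illustration; the formula is applied for n >= 3.
    """
    if n < 3:
        raise ValueError("the formula (2n-5)!! is applied here for n >= 3")
    count = 1
    for odd in range(1, 2 * n - 4, 2):  # odd runs over 1, 3, ..., 2n-5
        count *= odd
    return count


for n in (3, 4, 5, 6, 10):
    print(n, num_binary_trees(n))
```

The product grows faster than exponentially in $n$, which is why statements about "almost all" trees below are made relative to this count.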

We now describe a model for the evolution of binary sequences on a 
tree. This model has been described by various authors (and in a range 
of disciplines, including molecular biology, information theory and physics; 
for references, see [8, 16]). Here we refer to this model as the CFN model 
(short for Cavender-Farris-Neyman model); it has also been referred to in 
the literature as the "symmetric binary channel" and the "symmetric 2-state 
Poisson model." The CFN model provides a simple model for the evolution 
of purine-pyrimidine sequences. The significance of this simple model is that 
phenomena shown for the CFN model often extend to more realistic models 
of sequence evolution, and we will describe how our main results concerning 
the CFN model generalize. The term CFN tree will refer to a phylogenetic 
X-tree equipped with a CFN model. 

Suppose we have two states, 0 and 1, and a phylogenetic $X$-tree $T$. The
CFN model assigns probabilities to the patterns of states of the elements of
$X$ as follows. Let us associate a number $p_e$ ($0 < p_e < 1/2$) with each
edge $e$, called the transition probability. Let $\xi_e$ denote a random
indicator variable associated to edge $e$ with $\mathbb{P}[\xi_e = 1] = p_e$,
and assume the $\xi_e$'s are independent. Fix any vertex $v$ and assign state
0 or 1 to $v$ with equal probability 1/2. Note that, for every vertex $u$ of
$T$, there is a unique path from $u$ to $v$ in $T$, denoted path$(u,v)$, and
so we may define

(1.1)    $\mathrm{state}(u) = \mathrm{state}(v) + \sum_{e \in \mathrm{path}(u,v)} \xi_e \pmod 2.$

This gives a (joint) probability distribution on the set of all assignments
of states (0 or 1) to the vertices of $T$, and thereby a marginal
distribution on state assignments to the leaves of $T$. We call each such
assignment $\chi : X \to \{0, 1\}$ a (state) pattern, and we let
$\mathcal{P}_\chi$ denote the probability of generating $\chi$ under this
model.
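
The generative description (1.1) can be sketched as a small sampler. This is only an illustration under our own assumptions: the tree is given as an adjacency dictionary, the transition probabilities are keyed by unordered vertex pairs, and the names `sample_leaf_pattern`, `adj` and `p` are hypothetical.

```python
import random


def sample_leaf_pattern(adj, p, leaves, root, rng):
    """Draw one state pattern on `leaves` under the CFN model.

    adj    : dict mapping each vertex to its neighbours (the graph must be a tree)
    p      : dict mapping frozenset({u, w}) to the transition probability p_e
    root   : the vertex v of (1.1); its state is uniform on {0, 1}
    rng    : a random.Random instance
    """
    state = {root: rng.randrange(2)}
    stack = [root]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in state:
                # xi_e = 1 with probability p_e, independently for each edge
                flip = 1 if rng.random() < p[frozenset((u, w))] else 0
                state[w] = (state[u] + flip) % 2
                stack.append(w)
    return tuple(state[x] for x in leaves)


# Hypothetical example: the quartet tree ab|cd with internal vertices "u", "w".
adj = {"a": ["u"], "b": ["u"], "c": ["w"], "d": ["w"],
       "u": ["a", "b", "w"], "w": ["c", "d", "u"]}
edge_list = [("a", "u"), ("b", "u"), ("u", "w"), ("c", "w"), ("d", "w")]
p = {frozenset(e): 0.1 for e in edge_list}

rng = random.Random(0)
sites = [sample_leaf_pattern(adj, p, "abcd", "u", rng) for _ in range(5)]
print(sites)
```

Propagating the state edge by edge from $v$ is equivalent to summing the $\xi_e$ along path$(u,v)$ modulo 2, since each vertex is reached through exactly one path.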

The CFN model is thus specified by the pair $(T, \mathcal{P})$, where
$\mathcal{P}$ is the map that associates to each edge $e$ its transition
probability $p_e$. We refer to $T$ as a CFN tree and $\mathcal{P}$ as a
transition mechanism.

The probability $p$ that the endpoints of a path $uw$ in a CFN tree $T$ are
in different states is nicely related to the transition probabilities of the
edges of the $uw$-path:

(1.2)    $p = \frac{1}{2}\Bigl[1 - \prod_{e \in \mathrm{path}(u,w)} (1 - 2p_e)\Bigr].$

Formula (1.2) is well known and is easy to prove by induction. Formula
(1.2) also shows that the transition probability of a path is not less than
the largest transition probability on its edges. It is well known [18] that
(1) changing the location of $v$ in $T$, or (2) substituting a path by a
single edge in a CFN tree, and assigning to the new edge a transition
probability according to (1.2), does not change the probability distribution
of patterns.
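
Formula (1.2) can be checked numerically against a direct computation that tracks, edge by edge, the probability that an odd number of flips has occurred. The sketch below is illustrative only; both function names are our own.

```python
def path_flip_probability(edge_probs):
    """Formula (1.2): probability that the endpoints of a path are in different states."""
    prod = 1.0
    for pe in edge_probs:
        prod *= 1.0 - 2.0 * pe
    return 0.5 * (1.0 - prod)


def path_flip_probability_direct(edge_probs):
    """Same quantity, computed by the induction step behind (1.2)."""
    odd = 0.0  # probability that an odd number of edges flipped so far
    for pe in edge_probs:
        # parity stays odd if this edge does not flip, or becomes odd if it does
        odd = odd * (1.0 - pe) + (1.0 - odd) * pe
    return odd


probs = [0.1, 0.25, 0.4]  # hypothetical transition probabilities on a 3-edge path
print(path_flip_probability(probs), path_flip_probability_direct(probs))
```

For transition probabilities below 1/2, every factor $1 - 2p_e$ lies in $(0,1)$, so the product is at most the smallest factor; this is the observation that the path probability is at least the largest edge probability.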

Usually $k$ independent experiments are made to generate random patterns
from a binary CFN tree $T$; these are called sites. The (abstract) phylogeny
reconstruction problem is the following: from the observed pattern
frequencies, determine, with a prescribed probability, what the underlying
binary phylogenetic $X$-tree was. We have shown in [6] that if $|X| = n$ and
$n \to \infty$, then $k = \Omega(\log n)$ sites are needed to return the true
underlying tree with probability at least $\frac{1}{2} + \varepsilon$, with
either a deterministic algorithm or a randomized algorithm whose random bits
are independent of the random events on the CFN tree. Sequence length
requirements for accurate tree reconstruction are not only of mathematical
interest, but also a topical issue in molecular systematics (e.g., [3, 15]).
We showed in [6] that, for fixed $0 < f < g < 1/2$, $f \le p_e \le g$, and
$n \to \infty$, phylogeny reconstruction is possible for all model trees when
$k$ is a certain polynomial of $n$; is possible for some model trees when $k$
is a logarithmic function of $n$; and is possible for almost all model trees,
either in the uniform random binary $X$-tree model or in the Yule-Harding
model, when $k$ is a certain polylogarithmic function of $n$. More recent
work by Mossel and colleagues [5, 12] has established further instances for
which logarithmic dependence of $k$ on $n$ suffices for accurate tree
reconstruction and cases for which polynomial dependence is necessary.

In this paper we show asymptotic results. The theorems are about $n$-leaf
trees, but their conclusions are $o(1)$ (limit) relations as $n \to \infty$.
The understanding is that, for a sequence of $n$-leaf trees satisfying the
hypotheses, the limit relation holds. It would be technically more proper to
speak about sequences of trees in the statements of the theorems, but we
follow the tradition of random graph theory [1, 4] of not speaking explicitly
about sequences. With the exception of Section 4, we study problems where the
bounds on $p_e$ are fixed, and we let $n \to \infty$. In Section 4 we show
that many of the results generalize if dependence of the bounds on $n$ is
allowed but limited.

2. Results. Let us be given two binary phylogenetic $X$-trees $T_1, T_2$ with
CFN transition mechanisms $\mathcal{P}_1$ and $\mathcal{P}_2$, respectively.
The variational distance of their pattern distributions is

(2.1)    $\mathrm{vardist}((T_1,\mathcal{P}_1),(T_2,\mathcal{P}_2)) = \sum_{\chi} |(\mathcal{P}_1)_\chi - (\mathcal{P}_2)_\chi|.$
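
For very small trees, the variational distance (2.1) can be computed exactly by enumerating the root state and all edge indicators. The sketch below does this for two four-leaf trees; it is a brute-force illustration under our own conventions (edge dictionaries keyed by vertex pairs, hypothetical names `pattern_distribution`, `vardist` and `quartet`), and its running time is exponential in the number of edges.

```python
from itertools import product


def pattern_distribution(edges, leaves, root):
    """Exact CFN pattern distribution on `leaves`.

    edges : dict mapping (u, w) to the transition probability p_e
    Enumerates the root state and every choice of edge indicators xi_e.
    """
    adj = {}
    for (u, w) in edges:
        adj.setdefault(u, []).append(w)
        adj.setdefault(w, []).append(u)
    edge_list = list(edges)
    dist = {}
    for root_state in (0, 1):
        for flips in product((0, 1), repeat=len(edge_list)):
            prob = 0.5  # probability of the root state
            xi = {}
            for e, f in zip(edge_list, flips):
                prob *= edges[e] if f else 1.0 - edges[e]
                xi[frozenset(e)] = f
            state = {root: root_state}
            stack = [root]
            while stack:
                u = stack.pop()
                for w in adj[u]:
                    if w not in state:
                        state[w] = (state[u] + xi[frozenset((u, w))]) % 2
                        stack.append(w)
            chi = tuple(state[x] for x in leaves)
            dist[chi] = dist.get(chi, 0.0) + prob
    return dist


def vardist(d1, d2):
    """Variational distance (2.1) between two pattern distributions."""
    return sum(abs(d1.get(c, 0.0) - d2.get(c, 0.0)) for c in set(d1) | set(d2))


def quartet(pairing, p):
    """Four-leaf binary tree with cherries `pairing` and equal edge probability p."""
    (a, b), (c, d) = pairing
    return {(a, "u"): p, (b, "u"): p, ("u", "w"): p, (c, "w"): p, (d, "w"): p}


leaves = "abcd"
t1 = pattern_distribution(quartet((("a", "b"), ("c", "d")), 0.1), leaves, "u")
t2 = pattern_distribution(quartet((("a", "c"), ("b", "d")), 0.1), leaves, "u")
print(vardist(t1, t1), vardist(t1, t2))
```

On four leaves the two topologies are still far from the maximal distance 2; the results of this paper concern the regime $n \to \infty$.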

This distance lies between 0 and 2, and in Theorem 3.1 we show that almost
all binary trees are maximally distant (in terms of variational distance)
from any given binary tree with a given CFN transition mechanism, under mild
assumptions on their transition mechanisms. A practitioner may argue that
Theorem 3.1 has limited relevance, since the uniform distribution of trees is
just one particular prior distribution on trees, and the CFN model is very
particular. However, the conclusion of Theorem 3.1 holds not just for the
counting measure, but for all permutation invariant measures on phylogenetic
$X$-trees; moreover, it holds for more general classes of transition
mechanisms that are more realistic for the applications (Theorem 4.1). This
result may not be surprising: as we equip randomly selected trees with CFN
models, they have many local statistics that are essentially independent and
have different marginals in the two trees. Therefore, analogously to the
Kakutani dichotomy, their measures are expected to be (near) orthogonal.

Farach and Kannan [2, 9, 10] designed an algorithm for phylogeny
reconstruction based on convergence to the true tree in variational distance
and suggested paying more attention to the variational distance in phylogeny
reconstruction. Some support for the utility of this metric is provided by
results that we present in Sections 3 and 4: if we get just close to a model
tree in variational distance, then we have already excluded most of the false
candidates for the phylogenetic tree.

However, a simple fact provides a sharp contrast to the results mentioned
above. Note that in practice we estimate the model distribution of patterns
by the observed frequency of patterns. For subexponential sequence length,
which is known to be sufficient for phylogeny reconstruction with probability
$1 - o(1)$ for fixed $0 < f < g < 1/2$ and $f \le p_e \le g$, as
$n \to \infty$ (see the discussion in Section 1), the variational distance
between the model pattern distribution and the observed pattern distribution
is near 2 with probability $1 - o(1)$. (For details, see our technical
report [19].)

In other words, phylogeny reconstruction is entirely possible without
convergence of the observed pattern distribution to the model pattern
distribution in variational distance.

Therefore, the accuracy of tree reconstruction cannot be captured by
variational distance alone. This conclusion was suggested by [7] and [14],
though with less explicit theoretical justification.

3. Variational distance of CFN trees is typically large. 

Theorem 3.1. Fix $0 < f$ and $g < 1/2$. There exists a function
$\varepsilon(n) = \varepsilon_{f,g}(n) = o(1)$ as $n \to \infty$, such that,
for every binary phylogenetic $X$-tree $T_1$ with CFN transition mechanism
$\mathcal{P}_1$ where $p_e \le g$ in $\mathcal{P}_1$, the following holds:
For almost all [i.e., $(1 - o(1))(2n-5)!!$ in number] binary phylogenetic
$X$-trees $T_2$, equipped with an arbitrary transition mechanism
$\mathcal{P}_2$, where $f \le p_e$ in $\mathcal{P}_2$, we have

(3.1)    $\mathrm{vardist}((T_1,\mathcal{P}_1),(T_2,\mathcal{P}_2)) \ge 2 - \varepsilon(n).$

Fig. 1. Ending of a longest path in a binary tree.

The proof requires a number of lemmas, which we now state. 

Lemma 3.2. For every binary phylogenetic $X$-tree $T$ on $n \ge 4$ leaves,
there are at least $n/4$ disjoint pairs of leaves $a_i, b_i$, such that, for
every $i$:

(i) $a_i$ and $b_i$ are separated by a distance of 2 or 3;

(ii) for $i \ne j$, the $a_i b_i$ and the $a_j b_j$ paths in $T$ are edge disjoint.

Proof. The claim is true for $4 \le n \le 8$, since then any longest path
ends in two disjoint cherries. This is the basis for an induction proof on
$n$. It is easy to see that, for $n \ge 9$, there exists a longest path in
$T$, for which one end must be a leaf in a cherry that lies at the top
portion of the tree given by one of the four cases shown in Figure 1 (the
other end of the path lies in the bottom part of the tree, represented by a
circle). In each of the four cases, truncate the tree as indicated by the
dashed curve to obtain $T'_i$. For $i = 1, 2, 3, 4$, $T'_i$ has $n - 2$
(resp. $n - 2$, $n - 3$, $n - 4$) leaves, and the induction hypothesis
applies to $T'_i$. In all four cases it is easy to add two new close vertex
pairs to create the required set of them for $T$, while destroying at most
one which pre-existed in $T'_i$. □

Remark 3.3. As Figure 2 shows, the conclusion of Lemma 3.2 is essentially
the best possible.

Lemma 3.4 (Tree-chopping lemma, [17], Lemma 3). Let $T$ be an arbitrary
binary $X$-tree and $q \ge 2$ an integer. Then edges can be deleted from $T$
such that a forest results with the following properties:

(i) The number of leaves from $X$ in any tree of the forest is at most
$2q - 2$.

(ii) The number of leaves from $X$ in any tree of the forest is at least $q$,
except possibly for one tree. (We shall call this exceptional tree degenerate.)
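
One way to obtain a forest with the properties of Lemma 3.4 is a greedy bottom-up cut, sketched below. This is our own illustration and not necessarily the argument of [17]: the tree is rooted at a leaf, leaf sets are accumulated towards the root, and the edge above a vertex is deleted as soon as at least $q$ uncut leaves hang below it. The helpers `caterpillar` and `chop` are hypothetical names.

```python
from collections import defaultdict


def caterpillar(n):
    """Hypothetical test tree: binary caterpillar with leaves 0..n-1 (n >= 4)."""
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    spine = [("v", i) for i in range(n - 2)]  # internal vertices
    for a, b in zip(spine, spine[1:]):
        link(a, b)
    link(spine[0], 0)
    link(spine[-1], n - 1)
    for i, v in enumerate(spine):
        link(v, i + 1)
    return adj


def chop(adj, q):
    """Greedy tree chopping; returns the leaf sets of the resulting forest.

    Every returned set has at most 2q-2 leaves; all but possibly the last
    (the component of the root) have at least q leaves.
    """
    leaves = [v for v in adj if len(adj[v]) == 1]
    root = leaves[0]  # rooting at a leaf gives every other internal vertex two children
    parent = {root: None}
    order, stack = [], [root]
    while stack:  # depth-first preorder
        v = stack.pop()
        order.append(v)
        for w in adj[v]:
            if w != parent[v]:
                parent[w] = v
                stack.append(w)
    pending = {v: ({v} if len(adj[v]) == 1 else set()) for v in order}
    blocks = []
    for v in reversed(order):  # descendants are visited before ancestors
        if v == root:
            break
        if len(pending[v]) >= q:
            blocks.append(pending[v])  # delete the edge between v and its parent
        else:
            pending[parent[v]] |= pending[v]
    blocks.append(pending[root])  # possibly the degenerate tree
    return blocks


blocks = chop(caterpillar(50), q=5)
print([len(b) for b in blocks])
```

Each child passes up at most $q - 1$ leaves, so a vertex with two children never accumulates more than $2q - 2$, which is where the upper bound in (i) comes from in this sketch.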

Recall the Azuma-Hoeffding inequality (see [1]): 

Lemma 3.5. Suppose $X = (X_1, X_2, \dots, X_k)$ are independent random
variables taking values in any set $S$, and $L : S^k \to \mathbb{R}$ is any
function that satisfies the condition: $|L(u) - L(v)| \le t$ whenever $u$ and
$v$ differ at just one coordinate. Then,

(3.2)    $\mathbb{P}\bigl[|L(X) - \mathbb{E}[L(X)]| \ge \lambda\bigr] \le 2\exp\Bigl(-\frac{\lambda^2}{2t^2k}\Bigr).$
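
As a sanity check of how (3.2) is used later (with $t = 1$, $k = s$, $\lambda = s/3$), the following sketch evaluates the right-hand side and compares it with a Monte Carlo estimate of the tail for a sum of independent Bernoulli variables. The function names and the chosen parameters are our own assumptions.

```python
import math
import random


def azuma_hoeffding_bound(lam, t, k):
    """Right-hand side of (3.2): 2 exp(-lam^2 / (2 t^2 k))."""
    return 2.0 * math.exp(-lam ** 2 / (2.0 * t ** 2 * k))


def empirical_tail(k, p, lam, trials, rng):
    """Monte Carlo estimate of P[|L(X) - E L(X)| >= lam], L = sum of k Bernoulli(p)."""
    hits = 0
    for _ in range(trials):
        total = sum(1 for _ in range(k) if rng.random() < p)
        if abs(total - k * p) >= lam:
            hits += 1
    return hits / trials


k, p = 200, 0.9  # hypothetical values
lam = k / 3
print(azuma_hoeffding_bound(lam, 1, k), empirical_tail(k, p, lam, 2000, random.Random(0)))
```

With $t = 1$, $k = s$ and $\lambda = s/3$ the exponent is $-(s/3)^2/(2s) = -s/18$, so the bound reads $2e^{-s/18}$.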

The following lemma is obvious. 

Lemma 3.6. Let $T$ denote a fixed phylogenetic $X$-tree, with $|X| = n$,
and let $\tau = [T]$ (the corresponding unlabeled tree). Let $\pi$ be a
randomly selected permutation of $X$ under the uniform distribution. Let
$\pi(T)$ denote the phylogenetic $X$-tree that we obtain from $T$ by changing
all leaf labels from $v$ to $\pi(v)$ simultaneously. Then $\pi(T)$ represents
a random uniform selection from those binary phylogenetic $X$-trees whose
underlying unlabeled tree is $\tau$.

From now on, for notational convenience, we pretend that 4 divides n. 

Lemma 3.7. For an $X$ with $|X| = n$, and $n/4$ disjoint ordered pairs
$a_i, b_i$ from $X$, there exist functions $m(n) \to \infty$,
$h(n) \to \infty$ and $g(n) \to \infty$, such that the following holds. For
every unlabeled binary tree $\tau$ with $n$ leaves, for all but a $1/g(n)$
fraction of binary phylogenetic $X$-trees $T$ with property $[T] = \tau$,
there is an index set $I$ such that $|I| = m(n)$ and:

(i) $d_T(a_i, b_i) \ge h(n)$ for all $i \in I$; and

(ii) for $i, j \in I$, $i \ne j$, $\mathrm{path}_T(a_i, b_i)$ and $\mathrm{path}_T(a_j, b_j)$ are edge disjoint.

Fig. 2. Binary tree on $4t + 9$ leaves, with only $t + 3$ close leaf pairs.

Proof. Let $T$ denote a fixed binary phylogenetic $X$-tree such that
$[T] = \tau$, with $|X| = n$. Apply Lemma 3.4 to $T$ with $q = [\log^2 n]$.
Let $L_1, L_2, \dots, L_s$ denote the leaf sets that the nondegenerate trees
contain from $X$. From the lemma, $q \le |L_i| \le 2q - 2$, and at most
$q - 1$ elements of $X$ are not in some $L_i$. Let $\pi$ be a randomly
selected permutation of $X$ under the uniform distribution. Let $\pi(T)$
denote the phylogenetic $X$-tree that we obtain from $T$ by changing all leaf
labels from $v$ to $\pi(v)$ simultaneously. According to Lemma 3.6, $\pi(T)$
represents a random uniform selection from those binary phylogenetic
$X$-trees whose underlying unlabeled tree is $\tau$. The previous application
of Lemma 3.4 still partitions $\pi(T)$: the leaf sets of the nondegenerate
trees intersect $X$ in $\pi(L_1), \pi(L_2), \dots, \pi(L_s)$, and we still
have $q \le |\pi(L_i)| \le 2q - 2$. Therefore, for $i \ne j$, if
$\mathrm{path}^i_{\pi(T)}$ (resp., $\mathrm{path}^j_{\pi(T)}$) connects an
arbitrary pair of vertices of $L_i$ (resp., $L_j$) in the tree $\pi(T)$, then

(3.3)    $\mathrm{path}^i_{\pi(T)}$ is edge disjoint from $\mathrm{path}^j_{\pi(T)}$.

Set h(n) = log log n and m(n) = — \ . Observe from Lemma 3.4 and the 
choice of $q$ that $n < (s + 1)(2q - 2)$, and therefore,

(3.4)    $m(n) \le \frac{s}{2}.$

We are going to find an appropriate $g(n)$ for this choice. We call a leaf
set $Y \subseteq X$ infected if there is a $1 \le j \le n/4$ such that both
$a_j, b_j \in Y$. Let $E$ denote the event that, for our fixed $\tau$ and
$T$, $\pi(T)$ has the property that for all $j = 1, 2, \dots, s$, $\pi(L_j)$
is infected; and let $F$ denote the event that, in addition to $E$, for at
least half of the indices $j = 1, 2, \dots, s$, one finds some $i_j$ such
that both $a_{i_j}, b_{i_j} \in \pi(L_j)$ [i.e., they do infect $\pi(L_j)$]
and $d_{\pi(T)}(a_{i_j}, b_{i_j}) \ge h(n)$. In view of (3.3), the
$a_{i_j} b_{i_j}$ paths in $\pi(T)$ are pairwise edge disjoint for
$j = 1, 2, \dots, s$.
Observe that

(3.5)    $\mathbb{P}[\pi(L_j)\ \text{not infected}] = \frac{\sum_{u=0}^{|L_j|} \binom{n/4}{u}\, 2^u \binom{n/2}{|L_j| - u}}{\binom{n}{|L_j|}}.$

[A noninfected $L_j$ can have zero or one element from every $(a_i, b_i)$
pair, for $i = 1, 2, \dots, n/4$. The case analysis is based on the number
$u = |\pi(L_j) \cap \{a_i, b_i : i = 1, 2, \dots, n/4\}|$. There are
$\binom{n/4}{u}$ ways to select a subset of $u$ indices from
$\{1, 2, \dots, n/4\}$, and then $2^u$ ways to tell whether $a_i$ or $b_i$ is
selected into $L_j$ for each index of the subset. There are
$\binom{n/2}{|L_j| - u}$ ways to make $L_j$ complete using $|L_j| - u$
elements not belonging to $\{a_i, b_i : i = 1, 2, \dots, n/4\}$.]

Comparison of consecutive terms shows that the largest term in the numerator
of the RHS of (3.5) is the one with $u = |L_j|$. Using the usual notation
$(x)_m$ for the $m$th falling factorial, it follows that

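(3.6)    $\mathbb{P}[\pi(L_j)\ \text{not infected}] \le \frac{(|L_j| + 1)\, 2^{|L_j|}\, (n/4)_{|L_j|}}{(n)_{|L_j|}}$

(3.7)    $\le (|L_j| + 1)\, 2^{|L_j|}\, \frac{n^{|L_j|}}{4^{|L_j|} (n - |L_j|)^{|L_j|}}$

(3.8)    $\le (1 + o(1))\, 2^{-|L_j|} (|L_j| + 1) \le (1 + o(1))\, 2^{-q} (2q - 1) < 2^{-0.01 \log^2 n},$

and from (3.6)-(3.8),

(3.9)    $\mathbb{P}[\exists j : \pi(L_j)\ \text{not infected}] \le n\, 2^{-0.01 \log^2 n}.$

By (3.9), we showed that

(3.10)    $\mathbb{P}[E] \ge 1 - n\, 2^{-0.01 \log^2 n}.$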

Call the ordered $s$-tuple of pairwise disjoint sets
$Y_1, Y_2, \dots, Y_s \subseteq X$ feasible if $|Y_i| = |L_i|$ and $Y_i$ is
infected for $i = 1, 2, \dots, s$. Now we turn to the conditional probability
$\mathbb{P}[F \mid E]$. Observe

(3.11)    $\mathbb{P}[F \mid E] = \sum_{Y_1, Y_2, \dots, Y_s\ \text{feasible}} \mathbb{P}[F \mid \forall i : \pi(L_i) = Y_i]\; \mathbb{P}[\forall i : \pi(L_i) = Y_i]$

(3.12)    $\le \max_{Y_1, Y_2, \dots, Y_s\ \text{feasible}} \mathbb{P}[F \mid \forall i : \pi(L_i) = Y_i].$

Assume now that an arbitrary feasible $Y_1, Y_2, \dots, Y_s$ is fixed. A
$\pi$ that satisfies the condition in (3.12) is nothing else but the
juxtaposition of bijections $\pi_i : L_i \to Y_i$ for
$i = 1, 2, \dots, s + 1$. Therefore, a uniform random $\pi$ satisfying the
condition in (3.12) can be realized by a sequence of independent uniform
random choices of bijections $\pi_i$ from $L_i$ to $Y_i$,
$i = 1, 2, \dots, s + 1$.

Let $\pi_i : L_i \to Y_i$ denote a uniform random bijection for
$i = 1, 2, \dots, s + 1$. Conditional on $E$, for every $i = 1, 2, \dots, s$,
fix an $a_{j_i}, b_{j_i}$ leaf pair that infects $Y_i$. Observe that the
conditional event

$F \mid \forall i : \pi(L_i) = Y_i$

is implied if, for at least half of the indices $1 \le i \le s$, we have
$d_{\pi(T)}(a_{j_i}, b_{j_i}) \ge h(n)$. Also observe that, notwithstanding
the notation $d_{\pi(T)}$, this distance depends only on the single $\pi_i$
under consideration. No matter what the value of $\pi^{-1}(a_{j_i})$ is, at
most $2^{h(n)}$ vertices of $L_i$ can be closer than $h(n)$ to
$\pi_i^{-1}(a_{j_i})$ in the binary tree $T$. Only those at most $2^{h(n)}$
vertices can be pre-images of $b_{j_i}$ under $\pi_i$ (and $\pi$ as well) if
$d_{\pi(T)}(a_{j_i}, b_{j_i}) < h(n)$. Therefore,

$\mathbb{P}[d_{\pi(T)}(a_{j_i}, b_{j_i}) \ge h(n)] \ge 1 - \frac{2^{h(n)}}{\log^2 n} = 1 - \frac{2^{\log\log n}}{\log^2 n} = 1 - \frac{1}{\log^{2 - \log 2} n}.$

Hence, a lower bound for $\mathbb{P}[F \mid E]$ is the probability of at
least $s/2$ successes in a sequence of $s$ independent Bernoulli trials, each
with probability of success $p = 1 - \log^{-(2 - \log 2)} n$. Not having at
least $m(n)$ successes implies not having at least $s/2$ successes by (3.4),
and the probability of the latter event can easily be bounded from above by
Lemma 3.5 ($t = 1$, $k = s$, $\lambda = s/3$), as soon as
$\log^{-(2 - \log 2)} n < 1/6$, by

(3.13)    $2e^{-s/18}.$
Finally, using (3.10) and (3.13), we have

(3.14)    $1 - \mathbb{P}[F] = 1 - \mathbb{P}[E] + \mathbb{P}[E](1 - \mathbb{P}[F \mid E]) \le n\, 2^{-0.01 \log^2 n} + 2e^{-n/(64 \log^2 n)},$

and since the RHS of (3.14) is $o(1)$, we can take for $g(n)$ its reciprocal. □



Proof of Theorem 3.1. Specify now $n/4$ leaf pairs $\{a_i, b_i\}$ of $T_1$
according to Lemma 3.2; for notational convenience, we assume again that $n$
is a multiple of 4. We set $m(n)$, $h(n)$, $g(n)$ and $I$ according to the
statement of Lemma 3.7. We are going to show that, for every fixed
$(T_1, \mathcal{P}_1)$ and fixed unlabeled tree $\tau$, if $[T_2] = \tau$ and
$T_2$ is not in the exceptional set of trees described in Lemma 3.7, then the
variational distance between $(T_1, \mathcal{P}_1)$ and
$(T_2, \mathcal{P}_2)$ differs from 2 by at most a quantity that is $o(1)$ as
a function of $n$. Recall that state$(x)$ denotes the state of leaf
$x \in X$ in a CFN tree. Consider the random indicator variable $Z_i$, which
is 1 if state$(a_i)$ = state$(b_i)$, and 0 otherwise, and
$Z = \sum_{i \in I} Z_i$, which depends on the distribution of leaf
colorations of the CFN tree. We will speak about $Z_i^{(1)}, Z^{(1)}$ and
$Z_i^{(2)}, Z^{(2)}$ as the CFN tree is $(T_1, \mathcal{P}_1)$ or
$(T_2, \mathcal{P}_2)$, and similarly about state$_1$ and state$_2$, and will
drop the superscript if the argument applies to both.

By the linearity of expectation,

(3.15)    $\mathbb{E}[Z] = \sum_{i \in I} \mathbb{E}[Z_i] = \sum_{i \in I} \mathbb{P}[\mathrm{state}(a_i) = \mathrm{state}(b_i)].$

In $(T_1, \mathcal{P}_1)$, we have
$\mathbb{P}[\mathrm{state}_1(a_i) \ne \mathrm{state}_1(b_i)] \le \frac{1}{2}(1 - (1 - 2g)^3)$,
by (1.2), and hence,

(3.16)    $1 - 3g + 6g^2 - 4g^3 \le \mathbb{P}[\mathrm{state}_1(a_i) = \mathrm{state}_1(b_i)].$

Formula (3.15) and inequality (3.16) imply that

(3.17)    $\mathbb{E}[Z^{(1)}] \ge (1 - 3g + 6g^2 - 4g^3)\, m(n).$

In $(T_2, \mathcal{P}_2)$, by a similar argument, we have

(3.18)    $\mathbb{P}[\mathrm{state}_2(a_i) = \mathrm{state}_2(b_i)] \le 1 - \frac{1}{2}\bigl(1 - (1 - 2f)^{h(n)}\bigr) = \frac{1}{2} + o(1)$

by (1.2), and $h(n) \to \infty$. By linearity (3.15), we have

(3.19)    $\mathbb{E}[Z^{(2)}] \le (1 + o(1))\, \frac{m(n)}{2}.$

We are going to show that, with high probability, both $Z^{(1)}$ and
$Z^{(2)}$ are very close to their respective expectations. This will be easy
to show, since both of them are sums of independent indicator variables.
[Use Lemma 3.5 for $X_i = Z_i^{(1)}$ (resp. $Z_i^{(2)}$), $k = m(n)$,
$t = 1$, $\lambda = m(n)^{2/3}$.] It is easy to see that, for
$0 < g < 1/2$, we have

(3.20)    $1/2 < 1 - 3g + 6g^2 - 4g^3,$

and therefore, using (3.17) and (3.19), $\mathbb{E}[Z^{(1)}]$ and
$\mathbb{E}[Z^{(2)}]$ are separated by a linear function of $m(n)$, for
example, $l(n) = \frac{1}{2}\bigl(1 - 3g + 6g^2 - 4g^3 + \frac{1}{2}\bigr) m(n)$.
Consider now the event $H$: "$Z > l(n)$." In $(T_1, \mathcal{P}_1)$, event
$H$ has probability $1 - o(1)$, while in $(T_2, \mathcal{P}_2)$, the
complement of event $H$ has probability $1 - o(1)$. This implies that the
variational distance of $(T_1, \mathcal{P}_1)$ and $(T_2, \mathcal{P}_2)$ is
$2 - o(1)$. □
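
The separation used in this proof can be illustrated numerically. The sketch below evaluates the lower bound (3.16), the upper bound (3.18) and the threshold $l(n)$; the function names and the sample values of $f$, $g$, $h$ and $m$ are our own choices for illustration.

```python
def agree_lower_bound(g, dist=3):
    """Lower bound on P[state_1(a_i) = state_1(b_i)] for d(a_i, b_i) <= dist, p_e <= g.

    For dist = 3 this equals 1 - 3g + 6g^2 - 4g^3, as in (3.16).
    """
    return 0.5 * (1.0 + (1.0 - 2.0 * g) ** dist)


def agree_upper_bound(f, h):
    """Upper bound (3.18) on P[state_2(a_i) = state_2(b_i)] for d(a_i, b_i) >= h, p_e >= f."""
    return 0.5 * (1.0 + (1.0 - 2.0 * f) ** h)


def threshold(g, m):
    """The separating value l(n) = (1/2)(1 - 3g + 6g^2 - 4g^3 + 1/2) m(n)."""
    return 0.5 * (1 - 3 * g + 6 * g ** 2 - 4 * g ** 3 + 0.5) * m


g, f, h, m = 0.2, 0.1, 50, 1000  # hypothetical parameter values
print(agree_lower_bound(g) * m, threshold(g, m), agree_upper_bound(f, h) * m)
```

The threshold sits halfway between $m(n)/2$ and the lower bound on $\mathbb{E}[Z^{(1)}]$, so deviations of order $m(n)^{2/3}$ cannot carry $Z^{(1)}$ below it or $Z^{(2)}$ above it once $m(n)$ is large.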

4. Variational distance in more general models. In this section we provide
a result (Theorem 4.1) that is a threefold generalization of Theorem 3.1.
The three extensions allow (i) more general probability distributions on
trees ("permutation-invariant measures"), (ii) more general transition
models than the CFN model ("conservative, separable processes") and (iii) a
weakening of the constraints on the parameters of the model.

Permutation-invariant measures on trees. Let us call a measure $\mu$ on the
set of $(2n-5)!!$ binary phylogenetic $X$-trees permutation invariant if, for
every permutation $\pi$ of $X$ and any phylogenetic $X$-tree $T$,
$\mu(T) = \mu(\pi(T))$. Note that Lemma 3.6 stated that the uniform
distribution (or counting measure) on binary phylogenetic $X$-trees is
permutation invariant. A practitioner may argue that Theorem 3.1 has limited
relevance, since the uniform distribution of trees is just one particular
prior distribution on trees. However, any relevant distribution of trees is
permutation invariant, and it is easy to see that a stronger version of
Theorem 3.1 holds with basically the same proof. A nonuniform,
phylogenetically relevant permutation invariant distribution on phylogenetic
$X$-trees is the unrooted Yule-Harding distribution [6].

More general transition processes (conservative, separable processes). The
restriction of the CFN model to two states and symmetric transition
probabilities is convenient for description and proofs. However, much of the
argument used in the proof of Theorem 3.1 can be generalized to models that
are much closer to those used in modern molecular biology. We identify two
key properties that are used in the proof, and that both apply to a range of
substitution models.

Suppose we have a set $S$ of $q \ge 2$ states. A pattern will now refer to a
state assignment function $\chi : X \to S$, where $X$ is the leaf set of $T$.
Assume that we have a probability distribution on the patterns of a binary
phylogenetic $X$-tree, where $\mathcal{P}_\chi$ denotes the probability of
pattern $\chi$. Selecting a random pattern according to the distribution, we
can observe a random state of any particular leaf. For a pair of leaves
$a, b$, let $E(a, b)$ be the event that state$(a)$ = state$(b)$. Let us be
given a strictly decreasing function $H : [0, \infty) \to (c, 1]$ with
$H(0) = 1$, and a constant $c > 0$, such that
$\lim_{x \to \infty} H(x) = c$. We assume that $H$ and $c$ are fixed and do
not depend on $n$. We say that a probability distribution on patterns is
conservative if

(C) there exists an assignment of $t(e) > 0$ to each edge $e$ of $T$,
so that the following condition holds: for each pair $a, b \in X$,
$\mathbb{P}[E(a, b)] = H\bigl(\sum_{e \in \mathrm{path}(a,b)} t(e)\bigr)$.

The CFN model satisfies condition (C), as can easily be seen from (1.2) by
taking $t(e) = -\frac{1}{2}\log(1 - 2p_e)$,
$H(x) = \frac{1}{2}(1 + \exp(-2x))$, and $c = \frac{1}{2}$. More generally,
condition (C) is satisfied by any tree-based Markov process that can be
realized by a stationary, reversible, continuous-time Markov process
operating on each edge $e$ of $T$ for a duration [corresponding to $t(e)$]
(this is Theorem 4(2) of [18]; for more details on such models, see [16]).
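
The claim that the CFN model satisfies condition (C) with these choices of $t(e)$ and $H$ can be verified numerically on a single path. The sketch below is illustrative only; `t_of_p`, `H` and `agreement_probability` are names of our own choosing.

```python
import math


def t_of_p(pe):
    """Additive edge parameter t(e) = -(1/2) log(1 - 2 p_e) for the CFN model."""
    return -0.5 * math.log(1.0 - 2.0 * pe)


def H(x):
    """H(x) = (1/2)(1 + exp(-2x)); strictly decreasing from H(0) = 1 towards c = 1/2."""
    return 0.5 * (1.0 + math.exp(-2.0 * x))


def agreement_probability(edge_probs):
    """P[endpoints of the path agree] = 1 - (1.2) = (1/2)(1 + prod(1 - 2 p_e))."""
    prod = 1.0
    for pe in edge_probs:
        prod *= 1.0 - 2.0 * pe
    return 0.5 * (1.0 + prod)


probs = [0.1, 0.25, 0.4]  # hypothetical transition probabilities along a path
total_t = sum(t_of_p(pe) for pe in probs)
print(H(total_t), agreement_probability(probs))
```

The logarithm turns the product in (1.2) into a sum over the edges of the path, which is exactly the additivity that condition (C) requires.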

Next, we say that a probability distribution on patterns is separable if it
satisfies the following property:

(S) Whenever $(a_1, b_1), (a_2, b_2), \dots, (a_m, b_m)$ are pairs of leaves
whose connecting paths are pairwise edge-disjoint, then
$\{E(a_i, b_i), i = 1, \dots, m\}$ are independent events.

It is easily seen that the CFN model is separable. Moreover, any group-based
model satisfies the separation condition (S) (Theorem 10 of [20],
generalizing [11]); briefly, "group-based models" are defined in the same way
as the CFN model, but over an arbitrary finite Abelian group, rather than the
particular group $(\{0, 1\}, + \bmod 2)$ (for more details, see [16]).

We will call a model that satisfies conditions (C) and (S) a conservative,
separable process. Examples of such models include the CFN model, and, more
generally, the symmetric $q$-state model, for which, when a transition
occurs, one of the remaining states is selected uniformly at random. For this
model, we have $c = \frac{1}{q}$ in condition (C), and this model is well
known in a variety of fields, including physics, broadcasting and molecular
biology, where it is referred to as the "$q$-state Potts model," the "$q$-ary
symmetric channel," and the "Neyman $q$-state model," respectively (and, in
the special case when $q = 4$, as the "Jukes-Cantor model"); for more
details, see [13]. A further example of a conservative, separable process in
molecular biology is the Kimura 3ST model (for details, see [16]).

Weakened constraints. In Theorem 3.1 we imposed the condition $f \le p_e$
for a fixed $f > 0$ for the transition mechanism $\mathcal{P}_2$. In fact, an
inspection of the proof reveals that $0 < f = f(n)$ may depend on $n$, as
long as we have $\lim_{n \to \infty} h(n) f(n) = \infty$, where $h(n)$ is any
function satisfying the statement of Lemma 3.7. [The present proof of
Lemma 3.7 allows $f(n) \to 0$ "very slowly," but the truth is likely just
"slowly."]

The result allowing these three types of extensions is the following. 

Theorem 4.1. Fix $0 < t_+ < \infty$, and allow $t_- = t_-(n) > 0$ to vary
with $n$ if still $\lim_{n \to \infty} h(n)\, t_-(n) = \infty$, where $h(n)$
is any function satisfying the statement of Lemma 3.7. For every binary
phylogenetic $X$-tree $T_1$ with a conservative, separable process
$\mathcal{P}_1$ where $t(e) \le t_+$ in $\mathcal{P}_1$, and any permutation
invariant measure $\mu$ on phylogenetic $X$-trees, the following holds for a
function $\varepsilon(n) = o(1)$. A set of binary phylogenetic $X$-trees of
measure $1 - o(1)$ has the property that any of them equipped with an
arbitrary conservative, separable process $\mathcal{P}_2$, with
$t(e) \ge t_-$ in $\mathcal{P}_2$ (assuming $\mathcal{P}_2$ has the same $H$
and $c$ as $\mathcal{P}_1$) has

(4.1)    $\mathrm{vardist}((T_1, \mathcal{P}_1), (T_2, \mathcal{P}_2)) \ge 2 - \varepsilon(n).$

Proof. We need a straightforward modification of the proof of Theorem 3.1.
Leaving out the subscript from the notation for the generic leaf pair
$(a_i, b_i)$, formula (3.16) can be substituted by

(4.2)    $H(3t_+) \le H(d_{T_1}(a, b)\, t_+) \le \mathbb{P}[\mathrm{state}_1(a) = \mathrm{state}_1(b)];$

(3.18) can be substituted by

(4.3)    $\mathbb{P}[\mathrm{state}_2(a) = \mathrm{state}_2(b)] \le H(d_{T_2}(a, b)\, t_-) \le H(h(n)\, t_-) < c + \varepsilon$

for any fixed $\varepsilon > 0$ as $n \to \infty$. For a sufficiently small
$\varepsilon > 0$, we have

(4.4)    $c + \varepsilon < H(3t_+)$

[this follows from the assumptions on $H$ and $c$], and thus, inequality (4.4)
substitutes for (3.20). □